Abstract —A multi-access wireless network with N transmitting 
nodes, each equipped with an energy harvesting (EH) device and 
a rechargeable battery of finite capacity, is studied. At each time 
slot (TS) a node is operative with a certain probability, which may 
depend on the availability of data, or the state of its channel. The 
energy arrival process at each node is modelled as an independent 
two-state Markov process, such that, at each TS, a node either 
harvests one unit of energy, or none. At each TS a subset of 
the nodes is scheduled by the access point (AP). The scheduling 
policy that maximises the total throughput is studied assuming 
that the AP does not know the states of either the EH processes 
or the batteries. The problem is identified as a restless multi- 
armed bandit (RMAB) problem, and an upper bound on the 
optimal scheduling policy is found. Under certain assumptions 
regarding the EH processes and the battery sizes, the optimality 
of the myopic policy (MP) is proven. For the general case, the 
performance of MP is compared numerically to the upper bound. 

Index Terms —Energy harvesting, myopic policy, multi-access, 
online scheduling, partially observable Markov decision process, 
restless multi-armed bandit problem. 

I. Introduction 

Low-power wireless networks, such as machine-to-machine 
and wireless sensor networks, can be complemented with 
energy harvesting (EH) technology to extend the network 
lifetime. A low-power wireless node has a limited lifetime 
constrained by the battery size; but when complemented with 
an EH device and a rechargeable battery, its lifetime can be 
prolonged significantly. However, energy availability at the 
EH nodes is scarce, and, due to the random nature of the 
energy sources, energy arrives at random times and in arbitrary 
amounts. Hence, in order to make the most of the scarce 
energy, it is important to optimise the scheduling policy of the 
wireless network. 

Previous research on EH wireless networks can be grouped 
into three categories, based on the information available regarding the 
random processes governing the system m. In the offline 
optimization framework, availability of non-causal information 
on the exact realizations of the random processes governing 
the system is assumed at the transmitter m, 0. In the online 
optimization framework a-na, the statistics governing 
the random processes are assumed to be available at the 
transmitter, and their realizations are known only causally. 
The EH communication system is modeled as a Markov 
decision process (MDP) a, or as a partially observable MDP 
(POMDP) 0, and dynamic programming (DP) ifT^ can be 
used to optimise the EH communication system numerically. 
In many practical applications, the state space of the corresponding 
MDPs and POMDPs is large, and DP becomes 
computationally prohibitive ifT^, and the numerical results of 
DP do not provide much intuition about the structure of the 
optimal scheduling policy. In order to avoid complex numerical 
optimisations it is important to characterize the behaviour 
of the optimal scheduling policy and identify properties about 
its structure; however, this is possible only in some special 
cases 0, 0, 0. In the learning optimization framework, 
the knowledge about the system behaviour is further relaxed, 
and even the statistical knowledge about the random processes 
governing the system is not assumed, and the optimal policy 
scheduling is learnt over time im. 

We study online scheduling of low-power wireless nodes 
by an access point (AP). The nodes are equipped with EH 
devices, and powered by rechargeable batteries. At each time 
slot (TS) a node is operative with a certain probability, which 
may depend on the channel conditions or the availability of 
data at the node. The EH process at each node is modelled as 
an independent Markov process, and at each TS, a node either 
harvests one unit of energy or does not harvest any. The AP 
is in charge of scheduling, at each TS, the EH nodes to the 
available orthogonal channels. A node transmits only when it 
is scheduled and is operative at the same time. Hence, at each 
TS the AP learns the EH process states and battery levels of 
the operative nodes that are scheduled, but does not receive 
any information about the other nodes. The AP is interested in 
maximising the expected sum throughput within a given time 
horizon. This problem can be modelled as a POMDP and solved 
numerically using DP at the expense of a high computational 
cost. Instead, we model it as a restless multi-armed bandit 
(RMAB) problem Ql, and prove the optimality of a low- 
complexity policy in two special cases. Moreover, by relaxing 
the constraint on the number of nodes that the AP can schedule 
at each TS, we obtain an upper bound on the performance of 
the optimal scheduling policy. Finally, the performance of the 
low-complexity policy is compared to that of the upper bound 
numerically. The main technical contributions of the paper are 
summarised as follows: 

• We show the optimality of MP if the nodes cannot 
harvest energy and transmit data at the same time, and 
the EH process is affected by the scheduling policy. 

• We show the optimality of MP if the nodes do not have 
batteries and can transmit only if they have harvested 
energy in the previous TS. 

• We provide an upper bound on the performance for the 
general case by relaxing the constraint on the number of 
nodes that can be scheduled at each TS. 

• We show numerically that MP performs close to the upper 
bound for the general case. 

The rest of this paper is organized as follows. Section II is 
dedicated to a summary of the related literature. In Section III 
we present the EH wireless multi-access network model. In 
Sections IV and V we characterize explicitly the structure 
of the optimal policy that maximises the sum throughput 
for two special cases. In Section VI we provide an upper 
bound on the performance. Finally, in Section VII we compare 
the performance of MP with that of the upper bound through 
numerical analysis. Section VIII concludes the paper. 

II. Related Work 

There is a growing research interest in EH wireless communication 
systems, and in particular, in developing scheduling 
policies that exploit the scarce harvested energy in the most efficient 
manner. In large EH wireless networks, since numerical 
optimization is computationally prohibitive, it is important to 
characterise the optimal scheduling policy explicitly, or certain 
properties of it. 

In [6], the authors assume that the data packets arrive at the 
EH transmitter as a Poisson process, and each packet has an 
intrinsic value assigned to it, which is also a random variable. 
The optimal transmission policy that maximizes the average 
value of the received packets at the destination is proven to be 
a threshold policy. However, the values of the thresholds have 
to be computed using numerical techniques, such as DP or 
linear programming (LP). Reference Q extends the problem 
in 0 to the multi-access scenario. 

Multi-access in EH wireless networks with a central scheduler, 
static channels and backlogged nodes has been studied 
in ii-illoi. The central scheduler in 0 does not know the 
battery levels or the states of the EH processes at the nodes. 
Assuming that the nodes have unit size batteries, the system is 
modeled as an RMAB, and MP, which has a round robin (RR) 
structure, is shown to maximise the sum throughput. Reference 
0 considers nodes with batteries of arbitrary capacity, and MP 
is found to be optimal in two special cases. In contrast to the 
present paper, 11 considers static channels and backlogged 
nodes, and the optimality proof exploits the RR structure 
of MP. In [10], considering infinite-capacity batteries, an 
asymptotically optimal policy is proposed. 

The problem studied in this paper is modeled as an RMAB 
problem. In the classic RMAB problem there are several arms, 
each of which is modelled as a Markov chain m. The states 
of the arms are unknown, and at each TS an arm is played. 
The played arm reveals its state and yields a reward, which 
is a function of the state. The objective is to find a policy 
that maximises the total reward over time. RMAB problems 
have been shown to be, in general, PSPACE-hard d, and our 
knowledge of the structure of the optimal policy for general 
RMAB problems is limited. 

Recently, the RMAB model has been used to study channel 
access and cognitive radio problems, and new results on the 
optimality of MP have been obtained ifThl - l^ . The structure 
and the optimality of MP are proven in m and El for single 
and multiple plays, respectively, under certain conditions on 
the Markov transition probabilities. In ifTSll the optimality of 
MP is shown for a general class of monotone affine reward 
functions, which include arms with an arbitrary number of states. 
The optimality of MP is proven in ifT^ when the arms' states 
follow non-identical Markov chains. The case of imperfect 
channel detection is studied in [20], and MP is found to be 
optimal when the false alarm probability of the channel state 
detector is below a certain value. 

III. System Model 

We consider an EH wireless network with N EH nodes and 
one AP, as depicted in Figure 1. Time is divided into TSs of 
constant duration, and the AP is in charge of scheduling K of 
the N nodes to the K available orthogonal channels at each 
TS. A node is operative at each TS with a fixed probability p 
independent over TSs and nodes, and inoperative otherwise. 
We consider that a node is in the operative state if it has a 
data packet to transmit in its buffer and its channel to the AP 
is in a good state, while it is inoperative otherwise even if it 
is scheduled to a channel. The EH process is modelled as a 
Markov chain, which can be either in the harvesting or in the 
non-harvesting state, denoted by states 1 and 0, respectively. 
We denote by $p_{ij}$ the transition probability from state i to j, 
and assume that $p_{11} > p_{01}$, that is, the EH process is positively 
correlated in time, and hence, if the EH process is in state i, 
it is more likely to remain in state i than to switch to the 
other state. We denote by $E_i^s(n)$ and $E_i^h(n)$ the state of the 
EH process and the amount of energy harvested by node i, 
respectively, in TS n. The energy harvested in TS n is available 
for transmission in TS n+1. We assume that one fundamental 
unit of energy is harvested when the Markov process makes a 
transition to the harvesting state, that is, $E_i^h(n) = E_i^s(n+1)$.¹ 
Each node is equipped with a battery of capacity B, and we 
denote by $B_i(n) \in \{0, \ldots, B\}$ the amount of energy stored 
in the battery of node i at the beginning of TS n. The state of 
node i in TS n, $S_i(n)$, is given by its battery and EH process 
states, $S_i(n) = (E_i^s(n), B_i(n)) \in \{0,1\} \times \{0, \ldots, B\}$. The 
system state is characterized by the joint states of all the nodes. 
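
The single-node dynamics described above can be illustrated with a minimal simulation sketch. It assumes the battery rule implied by the text (an active node spends all stored energy, harvested energy is added for the next TS, and the level is capped at the capacity); the function names and the example parameter values are hypothetical and are not taken from the paper.

```python
import random

def step_eh_state(state, p01, p11, rng):
    """Advance the two-state EH Markov chain by one TS (1 = harvesting)."""
    p_to_harvest = p11 if state == 1 else p01
    return 1 if rng.random() < p_to_harvest else 0

def step_battery(battery, harvested, active, capacity):
    """An active node spends all stored energy; harvested energy is then added,
    capped at the battery capacity (assumed reading of the battery constraint)."""
    remaining = 0 if active else battery
    return min(remaining + harvested, capacity)

def simulate_node(num_slots, p01, p11, capacity, seed=0):
    """Simulate a node that is never active and record (EH state, battery) per TS."""
    rng = random.Random(seed)
    state, battery, trace = 0, 0, []
    for _ in range(num_slots):
        next_state = step_eh_state(state, p01, p11, rng)
        harvested = next_state  # one unit is harvested on a transition into state 1
        battery = step_battery(battery, harvested, active=False, capacity=capacity)
        state = next_state
        trace.append((state, battery))
    return trace
```

For instance, `simulate_node(50, 0.2, 0.8, 3)` returns a 50-slot trace in which the battery level never exceeds 3 and never decreases, since the node is kept idle.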

The system functions as follows: At the beginning of each 
TS, the AP schedules K out of N nodes, such that a single 
node is allocated to each orthogonal channel. When a node 
is scheduled, if it is operative in that TS, i.e., it has data to 
transmit and its channel is in a good state, it transmits a data 
packet as well as the current state of its EH process to the AP. 
If it is not operative it transmits a status beacon to the AP, 
and backs off. We say that a node is active in a TS if it is 
scheduled by the AP and is operative, and hence transmits 
a data packet to the AP; otherwise we say that the node is idle 
in this TS, that is, the node is either not scheduled, or scheduled 
but not operative. We denote by $\mathcal{K}(n)$ and $\mathcal{K}^a(n)$ the set 
of nodes scheduled by the AP, and the set of active nodes in 
TS n, respectively, where $\mathcal{K}^a(n) \subseteq \mathcal{K}(n)$. 

¹ Our results can be generalised to a broader class of two-state Markovian 
EH processes in which the amount of energy harvested in each state is an 
independent and identically distributed random variable, and the expected 
amount of harvested energy in the harvesting state is larger than that in the 
non-harvesting state. However, the studied EH model captures the random 
nature of the energy arrivals, and is also considered in 0,0, (a, [0 

Figure 1. System model with N EH nodes with finite size batteries and K 
orthogonal channels. 

We assume that the transmission rate is a linear function 
of the transmit power, which is an accurate approximation in 
the low power regime. When the power-rate function is linear, 
the total number of bits transmitted to the AP is maximised 
when an active node transmits at a constant power throughout 
the TS, using all its energy. To simplify the notation we 
normalise the power-rate function such that the number of 
bits transmitted within a TS is equal to the energy used for 
transmission. Then the expected throughput in TS n is 

$$R(\mathcal{K}(n)) = \mathbb{E}\Big[\sum_{i \in \mathcal{K}^a(n)} B_i(n)\Big] = p \sum_{i \in \mathcal{K}(n)} B_i(n). \qquad (1)$$


The objective of the AP is to schedule the best set of nodes, 
$\mathcal{K}(n)$, at each TS in order to maximize (1), without knowing 
which nodes are operative, the battery levels, or the EH states. 
The only information the AP receives is the EH state of the 
active nodes at each TS. Note that the AP also knows the 
battery state of the active nodes after transmission since they 
use all their energy. 

A scheduling policy is an algorithm that schedules nodes 
at each TS n, based on the previous observations of the EH 
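states and battery levels. The objective of the AP is to find the 
scheduling policy $\mathcal{K}(n)$, $\forall n \in [1, T]$, that maximizes the total 
discounted throughput, given by 

$$\max_{\{\mathcal{K}(n)\}_{n=1}^{T}} \sum_{n=1}^{T} \beta^{n-1} R(\mathcal{K}(n)),$$
$$\text{s.t. } B_i(n+1) = \min\{B_i(n) + E_i^h(n) - \mathbb{1}_{i \in \mathcal{K}^a(n)} \cdot B_i(n),\ B\}, \qquad (2)$$

where $0 < \beta < 1$ is the discount factor, and $\mathbb{1}_a$ is the indicator 
function, defined as $\mathbb{1}_a = 1$ if a is true, and $\mathbb{1}_a = 0$ otherwise. 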

If the AP were informed of the current state of all the nodes 
at each TS, the problem could be formulated as an MDP, and 
solved using LP or DP ifT^. However, in practice transmitting 
all the nodes’ states to the AP introduces further overhead 
and energy consumption; and hence, is not considered here. 
Accordingly, the appropriate model for our problem is a 
POMDP. It can be shown that a sufficient statistic for optimal 
decision making in a POMDP is given by the conditional 
probability of the system states given all the past actions 
and observations, which, in our problem, depends only on 
the number of TSs each node has been idle for, and on the 
realisation of each node's EH state the last time it was active. 
Hence, we can reformulate the POMDP into an equivalent 
MDP with an extended state space. The belief states, that is, 
the states in the equivalent MDP, are characterized by all the 
past actions and observations. We denote by $l_i$ and $h_i$ the 
number of TSs that node i has been idle for, and the state 
of the EH process the last time it was active, respectively. 
The belief state of node i, $s_i(n)$, is given by $s_i(n) = (l_i, h_i)$, 
and the belief state of the whole system is the joint belief 
states of all the nodes. In TS n, the belief state of node i 
is updated as $s_i(n+1) = (0, E_i^s(n))$ if $i \in \mathcal{K}^a(n)$, and as 
$s_i(n+1) = (l_i + 1, h_i)$ otherwise. That is, at each TS, $l_i$ is 
set to 0 if node i is active, and increased by one if it is idle. 
In principle, since the number of TSs a node can be idle is 
unbounded, the state space of the equivalent MDP is infinite, 
and hence, the POMDP in (2) is hard to solve numerically. 
In Sections IV and V we focus on two particular settings, 
and show the existence of optimal low-complexity scheduling 
policies under certain assumptions. 
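
A minimal sketch of the belief bookkeeping described above, assuming the belief of a node is stored as a pair (idle count, last observed EH state). The helper `prob_harvesting`, which propagates the two-state chain a given number of transitions from the last observation, is a hypothetical illustration of why such a pair is enough to recover the distribution of the EH state; it is not the paper's notation.

```python
def update_belief(belief, active, observed_eh_state=None):
    """Belief update: reset the idle counter and store the observed EH state
    when the node is active; otherwise increase the idle counter by one."""
    idle_count, last_state = belief
    if active:
        return (0, observed_eh_state)
    return (idle_count + 1, last_state)

def prob_harvesting(steps, last_state, p01, p11):
    """Probability that the EH chain is in the harvesting state `steps`
    transitions after it was observed in `last_state` (0 or 1)."""
    prob = float(last_state)
    for _ in range(steps):
        prob = prob * p11 + (1.0 - prob) * p01
    return prob
```

With the hypothetical values p01 = 0.2 and p11 = 0.8, `prob_harvesting` moves monotonically from the observed state towards the stationary probability p01 / (1 - p11 + p01) = 0.5 as the number of idle TSs grows.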

IV. Non Simultaneous Energy Harvesting and 
Data Transmission 

In this section we assume that the nodes are not able to 
harvest energy and transmit data simultaneously, and that if 
node i is active in TS n-1, then its EH state in TS n, $E_i^s(n)$, 
is either 0 or 1 with probabilities $e_0$ and $e_1$, respectively, 
independent of the EH state in TS n — 1, where eg < pof+pm ' 
These assumptions may account for nodes equipped with electromagnetic 
energy harvesters in which the same antenna is 
used for harvesting as well as transmission; and hence, it is not 
possible to transmit data and harvest energy simultaneously, 
and the RF hardware has to be reset into the harvesting mode 
after each transmission. 

Since the EH process is reset when a node transmits, the 
EH process states of active nodes are not relevant. As a 
consequence, the belief state of a node, $s_i(n)$, is characterized 
only by the number of TSs the node has been idle for, $l_i$. 
There is a one-to-one correspondence between $l_i$ and the 
expected battery level of node i; therefore, we redefine the 
belief state, $s_i(n)$, as the expected battery level of node i 
in TS n, normalised by the battery capacity. The expected 
throughput in (1) can be rewritten as 

$$R(\mathcal{K}(n)) = pB \sum_{i \in \mathcal{K}(n)} s_i(n). \qquad (3)$$

Notice that $s_i(n)$ in (3) is normalised, i.e., $s_i(n) \in [0, 1]$. 

Due to the Markovity of the EH processes, the future belief 
state is only a function of the current belief state and the 
scheduling policy. If a node is active in TS n, since it uses all 
its energy and does not harvest any, the belief state is set to 
0 in TS n+1. If a node is not active in TS n, then the belief 
state evolves according to the belief state transition function 
$\tau(\cdot)$. The belief state of node i in TS n+1 is 

$$s_i(n+1) = \begin{cases} \tau(s_i(n)) & \text{if } i \notin \mathcal{K}^a(n), \\ 0 & \text{if } i \in \mathcal{K}^a(n). \end{cases} \qquad (4)$$


Property 1. The belief state transition function, $\tau(\cdot)$, is a 
monotonically increasing contracting map, that is, $\tau(s_i(n)) > \tau(s_j(n))$ 
if $s_i(n) > s_j(n)$, and $\|\tau(s_i(n)) - \tau(s_j(n))\| < \|s_i(n) - s_j(n)\|$. 

Proof: The proof is given in Appendix A. ■ 

Note that the assumption $p_{11} > p_{01}$ is a necessary condition 
for Property 1. We denote by $\mathbf{s}(n) = (s_1(n), \ldots, s_N(n))$ the 
belief vector in TS n, which contains the belief states of all 
the nodes, and by $\mathbf{s}_{\mathcal{E}}(n)$ the belief vector of the nodes in set 
$\mathcal{E}$. For the sake of clarity we drop the n from $\mathbf{s}(n)$ and $\mathbf{s}_{\mathcal{E}}(n)$ 
when the time index is clear from the context. We denote the 
expected throughput by $R(\mathbf{s}_{\mathcal{E}})$ if the belief vector is $\mathbf{s}$ and 
nodes in $\mathcal{E}$ are scheduled. 

The probability that a particular set of nodes, $\mathcal{K}^a(n) \subseteq \mathcal{K}(n)$, 
is active while the rest of the scheduled nodes remain 
idle in TS n is a function of the cardinality of $\mathcal{K}^a(n)$ and the 
probability that a node is operative, p. For $a = |\mathcal{K}^a(n)|$ we 
denote this probability by 

$$q(a, K) \triangleq (1-p)^{K-a} p^{a}. \qquad (5)$$
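
As a quick numerical illustration, reading (5) as the probability that a specific set of a scheduled nodes is active and the remaining K-a scheduled nodes are idle, the probabilities of all possible active sets should add up to one. The sketch below checks this; the function names are hypothetical.

```python
from math import comb

def q(a, K, p):
    """Probability that one particular set of `a` scheduled nodes is active
    while the other K - a scheduled nodes are idle."""
    return (1.0 - p) ** (K - a) * p ** a

def total_probability(K, p):
    """Sum q over all subsets of the K scheduled nodes, grouped by cardinality."""
    return sum(comb(K, a) * q(a, K, p) for a in range(K + 1))
```

There are `comb(K, a)` distinct active sets of size a, so the sum is the binomial expansion of ((1 - p) + p)^K.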


The AP is interested in finding the scheduling policy $\pi$, 
which schedules the nodes according to $\mathbf{s}(n)$, that is, 
$\mathcal{K}(n) = \pi(\mathbf{s}(n))$, such that the expected throughput over the time 
horizon T is maximised. The associated optimization problem 
is expressed through the Bellman value functions, 

$$V_n^{\pi}(\mathbf{s}) = R(\mathbf{s}_{\pi(\mathbf{s})}) + \beta \sum_{\mathcal{E} \subseteq \pi(\mathbf{s})} q(|\mathcal{E}|, K)\, V_{n+1}^{\pi}\big((s_1(n+1), \ldots, s_j(n+1) = 0, \ldots, s_i(n+1) = \tau(s_i(n)), \ldots)\big), \qquad (6)$$

where the sum is over all possible sets of active nodes, $\mathcal{E}$, 
among the scheduled nodes, $\mathcal{K}(n) = \pi(\mathbf{s}(n))$, and nodes j 
and i are active and idle, respectively. The optimal policy, $\pi^*$, 
is the one that maximises (6). 


A. Definitions 

Definition 1. At TS n the myopic policy (MP) schedules 
the K nodes that maximise the expected instantaneous reward 
function, $R(\cdot)$. For the reward function in (3) the MP schedules 
the K nodes with the highest belief states. 

MP schedules the nodes similarly to a round robin (RR) 
policy that orders the nodes according to the time they have 
been idle for, and at each TS schedules the nodes with the 
highest idle time values. If a node is active in this TS, it is 
sent to the bottom of this ordered list in the next TS. If a node 
is idle it moves forward in the order. Notice that due to the 
monotonicity of $\tau(\cdot)$ the order of the idle nodes is preserved. 
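
The behaviour of MP in this setting can be sketched as follows. The sketch assumes beliefs are normalised expected battery levels, that an active node's belief is reset to 0 and an idle node's belief is mapped through a transition function `tau`, as described above. The concrete `tau` passed in is a hypothetical increasing contraction chosen only for illustration, and the function names are not from the paper.

```python
import random

def myopic_schedule(beliefs, K):
    """Return the indices of the K nodes with the highest belief states."""
    order = sorted(range(len(beliefs)), key=lambda i: beliefs[i], reverse=True)
    return order[:K]

def update_beliefs(beliefs, active, tau):
    """Active nodes are reset to 0; all other nodes evolve through tau."""
    return [0.0 if i in active else tau(s) for i, s in enumerate(beliefs)]

def run_myopic(beliefs, K, p, tau, num_slots, seed=0):
    """Run MP for num_slots TSs; each scheduled node is operative w.p. p."""
    rng = random.Random(seed)
    history = []
    for _ in range(num_slots):
        scheduled = myopic_schedule(beliefs, K)
        active = {i for i in scheduled if rng.random() < p}
        beliefs = update_beliefs(beliefs, active, tau)
        history.append((scheduled, sorted(active)))
    return beliefs, history
```

For example, `run_myopic([0.1, 0.9, 0.5], K=1, p=1.0, tau=lambda s: 0.5 * s + 0.25, num_slots=3)` schedules node 1 first and then moves on to the other nodes, which is the round-robin-like pattern described in the text.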

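We denote by $\mathbf{s}_\Pi = (s_{\Pi(1)}, \ldots, s_{\Pi(N)})$ the permutation of 
the vector $\mathbf{s}$, where $\Pi(\cdot)$ is a permutation function, by 
$\mathbf{s}_\Pi^K = (s_{\Pi(1)}, \ldots, s_{\Pi(K)})$ the vector containing the first K elements 
of $\mathbf{s}_\Pi$, and by $\Pi^K = \{\Pi(1), \ldots, \Pi(K)\}$ the set of indices of 
the nodes in positions from 1 to K in vector $\mathbf{s}_\Pi$. We say that 
a vector is ordered if its elements are in decreasing order. We 
denote by $\overset{o}{\Pi}$ the permutation that orders a vector, that is, the 
vector $\mathbf{s}_{\overset{o}{\Pi}}$ is ordered, i.e., 
$s_{\overset{o}{\Pi}(1)} \geq s_{\overset{o}{\Pi}(2)} \geq \cdots \geq s_{\overset{o}{\Pi}(N)}$. We 
denote the vector operator that first orders the vector $\mathbf{s}_{\mathcal{E}}$ of $|\mathcal{E}|$ 
components, and then applies $\tau(\cdot)$ to each of the components 
of the resulting vector by 
$\mathbf{T}(\mathbf{s}_{\mathcal{E}}) = (\tau(s_{\overset{o}{\Pi}(1)}), \ldots, \tau(s_{\overset{o}{\Pi}(|\mathcal{E}|)}))$, 
with $\overset{o}{\Pi}(i) \in \mathcal{E}$, $1 \leq i \leq |\mathcal{E}|$. Note that due to the monotonicity 
of $\tau(\cdot)$ the vector $\mathbf{T}(\mathbf{s}_{\mathcal{E}})$ is always ordered. Finally, we denote 
the zero vector of length k by $\mathbf{0}(k)$. 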

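Definition 2. The pseudo value function, $W_n(\mathbf{s}_\Pi)$, is defined as 

$$W_n(\mathbf{s}_\Pi) \triangleq R(\mathbf{s}_\Pi^K) + \beta \sum_{\mathcal{E} \subseteq \Pi^K} q(|\mathcal{E}|, K)\, W_{n+1}\big([\mathbf{T}(\mathbf{s}_{\bar{\mathcal{E}}}), \mathbf{0}(|\mathcal{E}|)]\big),$$
$$W_T(\mathbf{s}_\Pi) = R(\mathbf{s}_\Pi^K), \qquad (7)$$

where $[\cdot, \cdot]$ is the vector concatenation operator, and $\bar{\mathcal{E}}$ denotes 
the set of nodes that are not in $\mathcal{E}$. 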

$W_n(\cdot)$ is characterized solely by the belief vector $\mathbf{s}$ and its 
initial permutation $\Pi$. In TS n, the first K nodes according 
to permutation $\Pi$ are scheduled, and the nodes are scheduled 
according to MP thereafter. The belief vector in TS n+1 
is $\mathbf{s}(n+1) = [\mathbf{T}(\mathbf{s}_{\bar{\mathcal{E}}}), \mathbf{0}(|\mathcal{E}|)]$, where $\mathcal{E}$ is the set of active 
nodes in TS n, and, since $\mathbf{T}(\cdot)$ implicitly orders the output 
vector, $\mathbf{s}(n+1)$ is ordered. Hence, the nodes that are active 
in TS n have belief state 0 in TS n+1, and are moved to 
the rightmost position in the belief vector. If vector $\mathbf{s}_\Pi$ is 
ordered, (7) corresponds to the value function of MP, that is, 
corresponds to (6) where $\pi$ is MP. 

Definition 3. A permutation $\hat{\Pi}$ is an i,j-swap of permutation 
$\Pi$ if $\hat{\Pi}(k) = \Pi(k)$ for $\forall k \notin \{i, j\}$, and $\hat{\Pi}(j) = \Pi(i)$ and 
$\hat{\Pi}(i) = \Pi(j)$. That is, all the nodes but those in positions i 
and j are in the same positions in $\mathbf{s}_\Pi$ and $\mathbf{s}_{\hat{\Pi}}$, and the nodes 
in positions i and j are swapped. 

A permutation $\Pi$ is an i,j-swap if $\Pi(k) = k$ for $\forall k \notin \{i, j\}$, 
and $\Pi(i) = j$ and $\Pi(j) = i$. That is, all the nodes but 
those in positions i and j are in the same position in $\mathbf{s}$ and 
$\mathbf{s}_\Pi$, and the nodes in positions i and j are swapped. 

Definition 4. A function $f(\mathbf{x})$, $f : \mathbb{R}^k \to \mathbb{R}$ and 
$\mathbf{x} = (x_1, \ldots, x_k)$, is said to be regular if it is symmetric, 
monotonically increasing, and decomposable ifT^ . 

• $f(\mathbf{x})$ is symmetric if 
$f(\ldots, x_i, \ldots, x_j, \ldots) = f(\ldots, x_j, \ldots, x_i, \ldots)$. 

• $f(\mathbf{x})$ is monotonically increasing in each of its components, 
that is, if $x_j \leq \tilde{x}_j$ then 
$f(\ldots, x_j, \ldots) \leq f(\ldots, \tilde{x}_j, \ldots)$. 

• $f(\mathbf{x})$ is decomposable if 
$f(\ldots, x_j, \ldots) = x_j f(\ldots, 1, \ldots) + (1 - x_j) f(\ldots, 0, \ldots)$. 

Definition 5. (Boundedness) A function $f(\mathbf{x})$, $f : \mathbb{R}^k \to \mathbb{R}$ 
and $\mathbf{x} = (x_1, \ldots, x_k)$, is said to be bounded if 
$\Delta_l \leq f(\ldots, 1, \ldots) - f(\ldots, 0, \ldots) \leq \Delta_u$. 

We note that the expected throughput $R(\cdot)$ is a linear 
function of the belief vector, which has bounded elements, 
and all the nodes that are scheduled have the same coefficient; 
hence, $R(\cdot)$ is a bounded regular function. The pseudo value 
function, $W_n(\cdot)$, is symmetric, that is, 

$$W_n(\mathbf{s}_{\hat{\Pi}}) = W_n(\mathbf{s}_\Pi), \qquad (8)$$

where $\hat{\Pi}$ is an i,j-swap permutation of $\Pi$, and $j, i \leq K$ or 
$j, i > K$. To see this we can use the symmetry of $R(\cdot)$, and 
the fact that $\mathbf{T}(\cdot)$ orders the belief vector in decreasing order. 
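
The claim that the linear reward is a bounded regular function can be checked directly. The short derivation below is a sketch that takes the reward in (3) to be $R(\mathbf{s}) = pB \sum_i s_i$ over the scheduled nodes, and singles out one scheduled node j.

$$R(\ldots, s_j, \ldots) = pB \sum_{i \neq j} s_i + pB\, s_j = s_j \Big(pB \sum_{i \neq j} s_i + pB\Big) + (1 - s_j)\Big(pB \sum_{i \neq j} s_i\Big) = s_j R(\ldots, 1, \ldots) + (1 - s_j) R(\ldots, 0, \ldots),$$

$$R(\ldots, 1, \ldots) - R(\ldots, 0, \ldots) = pB.$$

The first line is the decomposability property, and the second shows that the increment is the same constant for every scheduled node, so the lower and upper bounds of Definition 5 coincide. Symmetry and monotonicity follow because every scheduled node enters the sum with the same positive coefficient pB.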

B. Proof of the optimality of MP 

We prove the optimality of MP under the assumptions that 
$\tau(\cdot)$ is a monotonically increasing contracting map² and $R(\cdot)$ 
is a bounded regular function. Hence, the results in this section 
can be applied to a broader class of EH processes and reward 
functions than those studied in this paper. 

The proof is structured as follows: Lemma 1 gives sufficient 
conditions for the optimality of MP in TS n, given that MP is 
optimal from TS n+1 onwards. In Lemma 2 we show that the 
difference between the pseudo value functions of two different 
vectors is bounded. In particular, we bound the difference 
between the value functions of two belief vectors $\mathbf{s}$ and $\tilde{\mathbf{s}}$, 
which are both ordered, and differ only in the belief state of 
a single node. In Lemma 3 we show that, under certain conditions, 
the sufficient conditions for the optimality of MP given in 
Lemma 1 hold. 

Lemma 1. Assume that MP is optimal from TS n+1 until TS 
T. A sufficient condition for the optimality of MP in TS n is 

$$W_n(\mathbf{s}) \geq W_n(\mathbf{s}_\Pi), \qquad (9)$$

for any $\Pi$ that is an i,j-swap, with $s_j \geq s_i$ and $j < i$. 

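Proof: To prove that a policy is optimal, we need to show 
that it maximizes (6). By assumption MP is optimal from 
TS n+1 onwards; and hence, it is only necessary to prove 
that scheduling any set of nodes and following MP thereafter 
is no better than following MP directly in TS n. The value 
function corresponding to the latter policy is $W_n([\mathbf{s}_{\mathcal{O}}, \mathbf{s}_{\bar{\mathcal{O}}}])$, 
where $\mathbf{s}_{\mathcal{O}}$ contains the K nodes with the highest belief states 
in $\mathbf{s}$, and $\mathbf{s}_{\bar{\mathcal{O}}}$ contains the rest of the nodes, not necessarily 
ordered. The value function corresponding to the former policy 
is $W_n([\mathbf{s}_{\mathcal{U}}, \mathbf{s}_{\bar{\mathcal{U}}}])$, where $\mathbf{s}_{\mathcal{U}}$ contains the K nodes scheduled 
in TS n, and $\bar{\mathcal{U}}$ is the set of the remaining nodes. If $\mathcal{U} \neq \mathcal{O}$, there 
exists at least one pair of nodes i and j such that $j \in \mathcal{O}$ and 
$j \notin \mathcal{U}$, $i \in \mathcal{U}$ and $i \notin \mathcal{O}$, and $s_j \geq s_i$. By swapping each 
pair of such nodes, that is, swapping $j \in \bar{\mathcal{U}}$ for $i \in \mathcal{U}$, we can 
obtain $W_n([\mathbf{s}_{\mathcal{O}}, \mathbf{s}_{\bar{\mathcal{O}}}])$ from $W_n([\mathbf{s}_{\mathcal{U}}, \mathbf{s}_{\bar{\mathcal{U}}}])$ through a cascade of 
inequalities using (9). Accordingly, $W_n([\mathbf{s}_{\mathcal{O}}, \mathbf{s}_{\bar{\mathcal{O}}}])$ is an upper 
bound for any $W_n([\mathbf{s}_{\mathcal{U}}, \mathbf{s}_{\bar{\mathcal{U}}}])$, and, hence, MP is optimal. ■ 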

Lemma 1 shows that, under certain conditions, the optimality 
of MP can be established through the pseudo value 
function. In particular, under the conditions of Lemma 1, if 
swapping a node in the belief vector with another node with a 
lower position and a lower belief state does not decrease the 
pseudo value function, then MP is optimal. 

Lemma 2. Consider a pair of belief vectors $\mathbf{s}$ and $\tilde{\mathbf{s}}$, which 
differ only in one element, that is, $\tilde{s}_i = s_i$ for $\forall i \neq j$ and 
$\tilde{s}_j \geq s_j$. If $R(\cdot)$ is a bounded regular function, $\tau(\cdot)$ a monotonically 
increasing contracting map, and $\beta < 1$, then we have 

$$W_n(\tilde{\mathbf{s}}) - W_n(\mathbf{s}) \leq \Delta_u (\tilde{s}_j - s_j)\, u(n), \qquad (10)$$

where $u(n) = \sum_{i=0}^{T-n} (\beta(1-p))^i$. 

Proof: See Appendix B. ■ 

² Our results can also be applied to the case in which the state transition 
function is a monotonically increasing contracting map with parameter $\alpha$, that 
is, $\tau(s_i(n)) > \tau(s_j(n))$ if $s_i(n) > s_j(n)$, and 
$\|\tau(s_i(n)) - \tau(s_j(n))\| < \alpha \|s_i(n) - s_j(n)\|$, if $0 < \alpha \cdot \beta < 1$. 

The result of Lemma 2 establishes that increasing the belief 
state of a node j from $s_j$ to $\tilde{s}_j$ may increase the value of the 
pseudo value function, and that this increase is bounded by a linear 
function of the increase in the belief, $\tilde{s}_j - s_j$, and the function u(n), 
which decreases with n and corresponds to the maximum 
accumulated loss from TS n to TS T. 
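
The function u(n) is a finite geometric sum, so it can be evaluated either term by term or in closed form. The sketch below, with hypothetical function names, computes both and can be used to confirm numerically that u(n) decreases with n and equals 1 at n = T.

```python
def u(n, T, beta, p):
    """u(n) as the sum of (beta * (1 - p))**i for i = 0, ..., T - n."""
    x = beta * (1.0 - p)
    return sum(x ** i for i in range(T - n + 1))

def u_closed_form(n, T, beta, p):
    """Closed form of the same geometric sum; requires beta * (1 - p) != 1."""
    x = beta * (1.0 - p)
    return (1.0 - x ** (T - n + 1)) / (1.0 - x)
```

For example, with the illustrative values T = 10, beta = 0.9 and p = 0.3, both functions agree for every n between 1 and T.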

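Lemma 3. Consider two belief vectors $\mathbf{s}$ and $\mathbf{s}_\Pi$, such that 
permutation $\Pi$ is an i,j-swap, and $s_j \geq s_i$ for some $j < i$. 
If $R(\cdot)$ is a bounded regular function, $\tau(\cdot)$ a monotonically 
increasing contracting map, and $\beta < 1$, then 

$$W_n(\mathbf{s}) - W_n(\mathbf{s}_\Pi) \geq 0 \quad \text{if} \quad \Delta_l > \Delta_u \beta p\, \frac{1 - (\beta(1-p))^{T-n}}{1 - \beta(1-p)}. \qquad (11)$$

Proof: See Appendix C. ■ 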

Theorem 1. If $R(\cdot)$ is a bounded regular function, $\tau(\cdot)$ a 
monotonically increasing contracting map, $\beta < 1$, and the condition 
on $\Delta_l$ and $\Delta_u$ in (11) holds for every TS n, then MP is the optimal policy. 

Proof: The proof is done by backward induction. We have 
already shown that MP is optimal at TS T. Then we assume 
that MP is optimal from TS n+1 until TS T, and we need to 
show that MP is optimal at TS n. To show that MP is optimal 
at TS n, using Lemma 1, we only need to show that (9) holds. 
This is proven in Lemma 3, which completes the proof. ■ 

The result of Theorem 1 holds for any $R(\cdot)$ that is a bounded 
regular function. The reward function studied here, i.e., the 
sum expected throughput in (3), is a bounded regular function, 
and we have $\Delta_u = \Delta_l = pB$. Finally, we state the optimality 
of MP for the EH problem studied in this section. 

Theorem 2. For the reward function $R(\cdot)$ defined in (3), if the 
transition probabilities satisfy $p_{11} > p_{01}$ and the reset probabilities 
satisfy the bound assumed at the beginning of this section, 
then MP is the optimal policy. 
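
As a sanity check on how the condition on the bounds specialises to the throughput reward, note that the right-hand side of the condition in (11) is $\Delta_u \beta p$ times a ratio whose denominator is $1 - \beta(1-p)$ and whose numerator is of the form $1 - (\beta(1-p))^m \leq 1$. The following sketch holds for any exponent $m \geq 0$, and uses only $0 < \beta < 1$ and $0 \leq p \leq 1$:

$$\beta p\, \frac{1 - (\beta(1-p))^m}{1 - \beta(1-p)} \leq \frac{\beta p}{1 - \beta(1-p)} < 1 \quad \Longleftrightarrow \quad \beta p < 1 - \beta + \beta p \quad \Longleftrightarrow \quad \beta < 1.$$

Hence, when $\Delta_u = \Delta_l = pB$, the bound on $\Delta_l$ is satisfied automatically, which is consistent with Theorem 2 imposing conditions only on the EH process parameters.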

V. Simultaneous Energy Harvesting and Data 
Transmission with Batteryless Nodes 

Now we consider another special case of the system model 
introduced in Section III. We assume that the nodes cannot 
store energy, and the harvested energy is lost if not used 
immediately. This might apply to low-cost batteryless nodes. 
Energy available for transmission in TS n is equal to the 
energy harvested in TS n-1, that is, $B_i(n) = E_i^h(n-1)$. 
We denote by $s_i(n)$ the belief state of node i at TS n, which 
is the expected energy available for transmission, that is, the 
probability that the node is in the harvesting state. The belief 
state transition probabilities are 

$$s_i(n+1) = \begin{cases} \tau(s_i(n)) & \text{if } i \notin \mathcal{K}^a(n), \\ p_{11} & \text{if } i \in \mathcal{K}^a(n), \text{ w.p. } s_i(n), \\ p_{01} & \text{if } i \in \mathcal{K}^a(n), \text{ w.p. } 1 - s_i(n), \end{cases} \qquad (12)$$

where $\tau(s) = (p_{11} - p_{01}) s + p_{01}$, and since $p_{11} > p_{01}$, it is 
a monotonically increasing affine function. This implies that 
if $s_i > s_j$ then $\tau(s_i) > \tau(s_j)$, that is, the order of the idle 
nodes is preserved. We note that $i \in \mathcal{K}^a(n)$ with probability 
p, if $i \in \mathcal{K}(n)$. The problem is to find a scheduling policy, 
$\mathcal{K}(n)$, such that the expected discounted sum throughput is 
maximised over a time horizon T. 

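We define the pseudo value function as follows 

$$W_n(\mathbf{s}_\Pi) = R(\mathbf{s}_\Pi^K) + \beta \sum_{\mathcal{E} \subseteq \Pi^K} \sum_{\mathbf{l}_{\mathcal{E}} \in \{0,1\}^{|\mathcal{E}|}} h(\mathbf{l}_{\mathcal{E}}, K)\, W_{n+1}\big(\mathbf{P}_{11}(\Sigma \mathbf{l}_{\mathcal{E}}), \boldsymbol{\tau}(\mathbf{s}_{\bar{\mathcal{E}}}), \mathbf{P}_{01}(\bar{\Sigma} \mathbf{l}_{\mathcal{E}})\big),$$
$$W_T(\mathbf{s}_\Pi) = R(\mathbf{s}_\Pi^K), \qquad (13)$$

where we denote the set of active nodes by $\mathcal{E}$ and the ith 
active node by $\mathcal{E}(i)$. We define $\mathbf{l}_{\mathcal{E}} = (l_{\mathcal{E}}(1), \ldots, l_{\mathcal{E}}(|\mathcal{E}|))$, 
such that $l_{\mathcal{E}}(i) = 1$ if the EH process of the corresponding 
node is in the harvesting state, and $l_{\mathcal{E}}(i) = 0$ otherwise. We 
define the function 
$h(\mathbf{l}_{\mathcal{E}}, K) = q(|\mathcal{E}|, K) \prod_{j \in \mathcal{E}} s_j^{l_{\mathcal{E}}(j)} (1 - s_j)^{1 - l_{\mathcal{E}}(j)}$, 
where $q(|\mathcal{E}|, K)$ is defined in (5). We denote by $\mathbf{P}_{01}(a)$ 
and $\mathbf{P}_{11}(a)$ the vectors $(p_{01}, \ldots, p_{01})$ and $(p_{11}, \ldots, p_{11})$, 
respectively, of length a, and we define 
$\Sigma \mathbf{l}_{\mathcal{E}} \triangleq \sum_{i \in \mathcal{E}} l_{\mathcal{E}}(i)$ and 
$\bar{\Sigma} \mathbf{l}_{\mathcal{E}} \triangleq |\mathcal{E}| - \sum_{i \in \mathcal{E}} l_{\mathcal{E}}(i)$. The operator $\boldsymbol{\tau}(\cdot)$ applies the mapping $\tau(\cdot)$ 
to all its components. The pseudo value function schedules the 
nodes according to permutation $\Pi$, and if $\mathbf{s}_\Pi$ is ordered, then 
(13) is the value function of MP. 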

Swapping the order of two scheduled nodes does not change 
the value of the pseudo value function, that is, the pseudo value 
function is symmetric. This property is similar to that in (8), 
but only for $i, j \leq K$. Similarly to m and ini, the mapping 
$\tau(\cdot)$ is linear, and hence, the pseudo value function is affine 
in each of its elements. This implies that, if $\hat{\Pi}$ is an i,j-swap 
of $\Pi$, then 

$$W_n(\mathbf{s}_\Pi) - W_n(\mathbf{s}_{\hat{\Pi}}) = (s_{\Pi(j)} - s_{\Pi(i)}) \big( W_n(\ldots, s_{\Pi(j)} = 1, \ldots, s_{\Pi(i)} = 0, \ldots) - W_n(\ldots, s_{\Pi(j)} = 0, \ldots, s_{\Pi(i)} = 1, \ldots) \big). \qquad (14)$$

MP schedules the nodes whose EH processes are more 
likely to be in the harvesting state. Initially, nodes are ordered 
according to an initial belief. If a node is active, it is sent to 
the first position of the queue if it is in the harvesting state, 
and to the last position if it is in the non-harvesting state. 
The idle nodes are moved forward in the queue. Due to the 
monotonicity of $\tau(\cdot)$, MP continues scheduling a node until it 
is active and its EH process is in the non-harvesting state. 
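
The queue behaviour described above can be sketched as a simple list manipulation. The representation below is an assumption made for illustration: the queue holds node indices in decreasing order of belief, the first K entries are scheduled, and `outcomes` (a hypothetical argument) records for each scheduled node whether it was idle (`None`), active and harvesting (`1`), or active and non-harvesting (`0`).

```python
from collections import deque

def update_queue(queue, K, outcomes):
    """Reorder the queue after one TS: active harvesting nodes go to the front,
    active non-harvesting nodes go to the back, and all idle nodes keep their
    relative order in between."""
    scheduled = list(queue)[:K]
    to_front = [i for i in scheduled if outcomes.get(i) == 1]
    to_back = [i for i in scheduled if outcomes.get(i) == 0]
    moved = set(to_front) | set(to_back)
    rest = [i for i in queue if i not in moved]
    return deque(to_front + rest + to_back)
```

For instance, starting from the queue [0, 1, 2, 3] with K = 2, if node 0 is active and non-harvesting while node 1 is active and harvesting, the next queue is [1, 2, 3, 0], so node 1 keeps being scheduled and node 0 waits at the end.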

A. Proof of the optimality of MP 

We note that the result of Lemma 1 is applicable in this 
case. If Lemma 4 holds, the same arguments as in Theorem 1 
can be used to prove the optimality of MP. 

Lemma 4. Let $\Pi$ be an i,j-swap, and consider a permutation 
$\tilde{\Pi}$, such that $\tilde{\Pi}(k) = k - 1$ for $\forall k \neq 1$ and $\tilde{\Pi}(1) = N$. If 
$s_j \geq s_i$ for some $j < i$, then we have the inequalities 

$$1 + W_n(\mathbf{s}_{\tilde{\Pi}}) \geq W_n(\mathbf{s}), \qquad (15a)$$
$$W_n(\mathbf{s}) \geq W_n(\mathbf{s}_\Pi). \qquad (15b)$$

Proof: The proof follows from arguments similar to those 
in El. In particular, we use backward induction in (15a) and 
(15b), and a sample-path argument. A sketch of the proof is 
provided in Appendix D. ■ 

Note that (15a) and (15b) are similar to (10) and (11), 
respectively. 

Theorem 3. If the reward function is $R(\mathcal{K}(n)) = p \sum_{i \in \mathcal{K}(n)} s_i(n)$, 
and $p_{11} > p_{01}$, MP is the optimal policy. 

Proof: Theorem 3 can be proven by using the same 
arguments as in Theorem 1 and Lemmas 1 and 4. ■ 

Remark 1. This problem is similar to the opportunistic
multi-channel access problem studied in [16]-[19], with imperfect
channel sensing, such that, at each attempt, a channel cannot
be sensed with probability $1 - p$, independent of its channel
state. While the MP has been proven to be optimal in the case
of perfect channel sensing, i.e., p = 1, El, the case with 
sensing errors, i.e., $p \neq 1$, has not been considered in the
literature. We also note that this model of imperfect channel
detection is different from that in [20].

Remark 2. Using techniques similar to those in im, the MP
optimality results of Sections IV and V can be extended from
the finite horizon discounted reward criterion to the infinite
horizon discounted reward criterion, and to the infinite horizon
average reward criterion.

VI. Upper Bound on the Performance of the
Optimal Scheduling Policy

Next we derive an upper bound on the performance of the 
optimal policy for the general model in Section III under the 
average reward criteria and infinite time horizon. The RMAB 
problem with an infinite horizon discounted reward criteria is 
studied in ED, and it is shown that an upper bound can be 
computed in polynomial time using LR. 

The decision of scheduling a node in a TS affects the 
scheduling of the other nodes in the same TS, since exactly K 
nodes have to be scheduled at each TS. Whittle El proposed 
to relax the original problem constraint, and impose instead 
that the number of nodes that are scheduled at each TS is 
K on average. In the relaxed problem, since the nodes are 
symmetric, one can decouple the original RMAB problem into 
$N$ RMAB problems, one for each node. As before, we denote
by $s = (l, h) \in \mathcal{W}$ the belief state of a node, where $l$ is the
number of TSs the node has been idle for, $h$ the EH state the
last time the node was scheduled, and $\mathcal{W}$ the belief state space.
We denote by $\pi(s)$ the probability that a node is scheduled if
it is in state $s$, by $p(s)$ the steady state probability of state $s$,
and by $p_{\tilde{s},s}(a)$ the state transition probability function from
state $\tilde{s}$ to $s$ if action $a \in \{0, 1\}$ is taken, where $a = 1$ if
the node is scheduled in this TS, and $a = 0$ otherwise. The
optimization problem is


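$$\max_{\pi(s),\, p(s)} \; \sum_{s \in \mathcal{W}} R(s)\,\pi(s)\,p(s)$$

$$\text{s.t.} \quad p(s) = \sum_{\tilde{s} \in \mathcal{W}} p(\tilde{s})\left[(1 - \pi(\tilde{s}))\,p_{\tilde{s},s}(0) + \pi(\tilde{s})\,p_{\tilde{s},s}(1)\right],$$

$$\sum_{s \in \mathcal{W}} \pi(s)\,p(s) = \frac{K}{N}, \quad \text{and} \quad \sum_{s \in \mathcal{W}} p(s) = 1, \quad (16)$$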

where $0 \le \pi(s), p(s) \le 1$, and $R(s)$ is the expected
throughput of a node if it is in state $s$. Note that the node
is scheduled every $N/K$ TSs on average. This implies that, for
$p = 1$, the maximum time a node can be idle is finite, and
hence, the state space $\mathcal{W}$ is finite. If $p \neq 1$, one can truncate
the state space by bounding the maximum time a node can be
idle, i.e., imposing that $l$ is bounded. The problem (16) has
a linear objective function and linear constraints, and the state
space is finite; therefore, it can be solved in polynomial time
with LP.
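As written, the objective of (16) contains the product $\pi(s)p(s)$. One standard way to expose the linear structure is a change of variables to state-action frequencies. The sketch below assumes this reformulation; the symbols $x_0(s)$ and $x_1(s)$ are introduced here for illustration and do not appear in the original text.

```latex
% Assumed change of variables: x_1(s) = \pi(s) p(s), x_0(s) = (1 - \pi(s)) p(s).
\begin{align*}
\max_{x_0,\, x_1 \ge 0} \quad & \sum_{s \in \mathcal{W}} R(s)\, x_1(s) \\
\text{s.t.} \quad & x_0(s) + x_1(s) = \sum_{\tilde{s} \in \mathcal{W}}
  \left[ x_0(\tilde{s})\, p_{\tilde{s},s}(0) + x_1(\tilde{s})\, p_{\tilde{s},s}(1) \right]
  \quad \forall s \in \mathcal{W}, \\
& \sum_{s \in \mathcal{W}} x_1(s) = \frac{K}{N}, \qquad
  \sum_{s \in \mathcal{W}} \left[ x_0(s) + x_1(s) \right] = 1 .
\end{align*}
% A scheduling probability is recovered as
% \pi(s) = x_1(s) / (x_0(s) + x_1(s)) whenever x_0(s) + x_1(s) > 0.
```

In these variables both the objective and all constraints are linear, so any LP solver can be applied to the truncated state space.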


VII. Numerical Results 

In this section we numerically study the performance of
different scheduling policies for the general case described in
Section III. In particular, we consider MP, which is optimal
for the cases studied in Sections IV and V; the RR policy,
which schedules the nodes in a cyclic fashion according to
an initial random order; and a random policy, which at each
TS schedules $K$ random nodes regardless of the history. We
measure the performance of the scheduling policies as the
average throughput per TS over a time horizon of $T = 1000$,
that is, we consider $\beta = 1$ and normalise (2) by $T$. We perform
100 repetitions for each experiment and average the results. We
assume, unless otherwise stated, a total of $N = 30$ EH nodes,
$K = 5$ available channels, and a probability $p = 0.5$ for a
node to be operative in each TS. We assume that all the nodes
and EH processes are symmetric, the batteries have a capacity
of $B = 5$ energy units, and the transition probabilities of the
EH processes are $p_{11} = p_{00} = 0.9$. Notice that, on average,
each node is scheduled every $N/K$ TSs. Hence, if $N/K$ is large the
nodes remain idle for longer periods. This implies that when
$N/K$ is large, since the nodes harvest over many TSs without being
scheduled, there are more energy overflows in the system. In
the numerical results we have included the infinite horizon
upper bound of Section VI, which for large $T$ is a tight upper
bound on the finite horizon case.
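To make the relation between $N/K$ and the battery overflows concrete, the following sketch computes the average scheduling interval and the average energy harvested between two scheduling instants for the default parameters. It is a back-of-the-envelope illustration only; the helper name `stationary_harvest_prob` is hypothetical, and $p_{ab}$ is assumed to denote the probability of moving from EH state $a$ to state $b$.

```python
def stationary_harvest_prob(p11: float, p00: float) -> float:
    """Stationary probability of the harvesting state (state 1)."""
    p01 = 1.0 - p00  # non-harvesting -> harvesting
    p10 = 1.0 - p11  # harvesting -> non-harvesting
    return p01 / (p01 + p10)


# Default parameters quoted in the text.
N, K, B = 30, 5, 5
p11 = p00 = 0.9

interval = N / K                             # average TSs between schedulings
rate = stationary_harvest_prob(p11, p00)     # average energy units per TS
expected_harvest = rate * interval           # average units harvested per interval

print(interval, rate, expected_harvest)
```

With these values a node harvests about 3 units on average between schedulings, below $B = 5$. Overflows can still occur because harvesting is bursty when $p_{11}$ is high and because a scheduled node is operative only with probability $p$.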

In Figure 2(a) we investigate the impact of the number
of nodes on the throughput, when the number of available
channels, $K$, is fixed. The throughput increases with the
number of nodes and, due to the battery overflows, saturates
when the number of nodes is large. By increasing the battery
capacity, hence reducing the battery overflows, the throughput
saturates with a higher number of nodes and at a higher value.
We observe that MP has a performance close to that of the
upper bound, the random policy has a lower performance than
the others, and the gap between different curves increases with
the battery capacity.


In Figure 2(b) we investigate the effect of the battery
capacity, $B$, on the system throughput when the number of
nodes is fixed. Clearly, the larger the battery capacity, the fewer
battery overflows will occur. The throughput increases with
the battery capacity and, due to the limited amount of energy
that the nodes can harvest, it saturates at a certain value. By
increasing the number of available channels, $K$, which also
reduces the battery overflow, the throughput saturates more
quickly as a function of the battery capacity, but at higher
values. The performances of the scheduling policies are similar
to those observed in Figure 2(a).




Figure 2. (a) Average throughput vs. number of nodes, $N$, with $K = 5$
channels, and battery capacity $B = 3, 5, 10$, and (b) average throughput vs.
battery capacity, $N = 30$, and $K = 1, 5, 10$.


Figure 3 shows the average throughput for different EH
process transition probabilities. We note that the amount of
energy arriving to the system increases with $p_{11}$ and decreases
with $p_{00}$. As expected, we observe in Figure 3 that the
throughput increases with $p_{11}$, and the values in Figure 3(a) are
notably higher than those in Figure 3(b). MP is a policy which
maximises the immediate throughput at each TS, and does not
take into account the future TSs. We observe in Figure 3(b)
for $B = \{5, 10\}$ and in Figure 3(a) for $B = 10$ that, if the EH
state has low correlation across TSs, that is, $p_{11} = \{0.5, 0.6\}$,































Figure 3. Average throughput for different EH process transition
probabilities, $N = 30$, $K = 5$, and $B = 3, 5, 10$.

the throughput obtained by MP is similar to that of the upper
bound. On the contrary, if it has high correlation across TSs,
that is, $p_{11} = \{0.8, 0.9\}$, the throughput falls below the upper
bound. This is due to the fact that when the state transitions
have low correlation it is difficult to reliably predict the impact
of the actions on the future rewards, and no transmission
strategy can improve upon MP. Our numerical results indicate
that, even in scenarios in which MP cannot be shown to
be theoretically optimal, it performs very close to the upper
bound, obtained for an infinite horizon problem.

VIII. Conclusions 

We have studied a scheduling problem in a multi-access 
communication system with EH nodes, in which the harvested 
energy at each node is modeled as a Markov process. We 
have modeled the system as an RMAB problem, and shown 
the optimality of MP in two settings: i) when the nodes 
cannot harvest energy and transmit simultaneously and the EH 
process state is independent of the past states after a node is 
active; ii) when the nodes have no battery. The results of this 
paper suggest that although the optimal scheduling in large 
EH networks requires high computational complexity, in some 
cases there exist simple and practical scheduling policies that 


have almost optimal performance. This can have an impact on 
the design of scheduling policies for large low-power wireless 
sensor networks equipped with energy harvesting devices and 
limited storage. 

Appendix A 

We denote the probability that the battery of a node is not
full if the node has been idle for the last $n$ TSs by $p_{nf}(n)$. It
is easy to note that $p_{nf}(n)$ is a decreasing function of $n$. If
the node has been idle for $n$ TSs, we denote the probability
of the EH process being in state 0 and 1 by $p_0(n) = p_{10} +
p_0(n-1)(p_{11} - p_{01})$ and $p_1(n) = 1 - p_0(n)$, respectively.
We set $p_0(0) = e_0$. Since $p_{11} \ge p_{01}$ and $e_0$ does not exceed the
steady-state value, $p_0(n)$ monotonically increases to the steady
state distribution ([22, Appendix B]).

We denote the belief state of a node that has been idle for $n$
TSs by $z_n$. If the node has been idle for $n+1$ TSs, the expected
battery level is $z_{n+1} = \tau(z_n) = z_n + p_{nf}(n)\left(p_{01}p_0(n) +
p_{11}p_1(n)\right)$, which is a monotonically increasing function. If
$n \ge m$, then $z_n \ge z_m$ and $\tau(z_n) \ge \tau(z_m)$. By applying
the definition of $p_1(n)$, we get $z_{n+1} = z_n + p_{nf}(n)\left(p_{11} -
p_0(n)(p_{11} - p_{01})\right)$. If we assume that $n \ge m$, we have

$$\|\tau(z_n) - \tau(z_m)\| = z_n - z_m + p_{nf}(n)\left(p_{11} - p_0(n)(p_{11} - p_{01})\right) - p_{nf}(m)\left(p_{11} - p_0(m)(p_{11} - p_{01})\right)$$

$$\le z_n - z_m - p_{nf}(m)\,(p_{11} - p_{01})\left(p_0(n) - p_0(m)\right)$$

$$\le z_n - z_m,$$

where the first inequality follows since $p_{nf}(n) \le p_{nf}(m)$, and
the second inequality follows since $p_0(n)$ is monotonically
increasing and $p_{11} \ge p_{01}$.
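The recursion for $p_0(n)$ can be checked numerically. The sketch below iterates it and compares the limit with the stationary value $p_{10}/(p_{10} + p_{01})$; the function name `p0_sequence` and the parameter values are illustrative assumptions, not taken from the paper.

```python
def p0_sequence(p11: float, p01: float, e0: float, n_max: int) -> list:
    """Iterate p0(n) = p10 + p0(n-1) * (p11 - p01), starting from p0(0) = e0."""
    p10 = 1.0 - p11
    seq = [e0]
    for _ in range(n_max):
        seq.append(p10 + seq[-1] * (p11 - p01))
    return seq


# Illustrative values: positively correlated EH process, e0 below the fixed point.
p11, p01, e0 = 0.9, 0.1, 0.0
seq = p0_sequence(p11, p01, e0, n_max=200)
steady = (1.0 - p11) / ((1.0 - p11) + p01)

print(seq[:4], seq[-1], steady)
```

When $p_{11} \ge p_{01}$ the map is increasing with slope $p_{11} - p_{01} \in [0, 1)$, so a sequence started below the fixed point rises towards it without overshooting.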

Appendix B 

The proof uses backward induction. We denote by $\mathcal{S}$ and
$\hat{\mathcal{S}}$ the nodes scheduled from $s^o$ and $\hat{s}^o$, respectively. We
first observe that (10) holds for $n = T$. This follows from
the bounded regularity of $R(\cdot)$, noting that $u(T) = 1$, and
distinguishing four possible cases.

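• Case 1: $j \in \mathcal{S}$ and $j \in \hat{\mathcal{S}}$, i.e., node $j$ is scheduled in
both cases.

$$W_T(s^o) - W_T(\hat{s}^o)$$
$$= R(s_{\Pi(1)}, \ldots, s_j, \ldots, s_{\Pi(K)}) - R(\hat{s}_{\Pi(1)}, \ldots, \hat{s}_j, \ldots, \hat{s}_{\Pi(K)})$$
$$= s_j R(s_{\Pi(1)}, \ldots, 1, \ldots, s_{\Pi(K)}) + (1 - s_j) R(s_{\Pi(1)}, \ldots, 0, \ldots, s_{\Pi(K)}) - \hat{s}_j R(\hat{s}_{\Pi(1)}, \ldots, 1, \ldots, \hat{s}_{\Pi(K)}) - (1 - \hat{s}_j) R(\hat{s}_{\Pi(1)}, \ldots, 0, \ldots, \hat{s}_{\Pi(K)})$$
$$= (s_j - \hat{s}_j)\big(R(s_{\Pi(1)}, \ldots, 1, \ldots, s_{\Pi(K)}) - R(s_{\Pi(1)}, \ldots, 0, \ldots, s_{\Pi(K)})\big)$$
$$\le (s_j - \hat{s}_j)\,\Delta_u\,u(T),$$

where the second equality follows from the decomposability
of $R(\cdot)$. Since $R(\cdot)$ is symmetric and the
belief vectors are equal but for node $j$, the remaining
arguments of $R(\cdot)$ coincide, which we use in the third equality. Finally,
the inequality follows from the boundedness of $R(\cdot)$.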

• Case 2: $j \notin \mathcal{S}$ and $j \notin \hat{\mathcal{S}}$, i.e., node $j$ is not scheduled
in either case. The same nodes with the same beliefs are
scheduled in both cases; hence, $s^{\mathcal{S}} = \hat{s}^{\hat{\mathcal{S}}}$, and
$W_T(s^o) - W_T(\hat{s}^o) = 0$.

• Case 3: $j \in \mathcal{S}$ and $j \notin \hat{\mathcal{S}}$. In this case there exists a
node $m \in \hat{\mathcal{S}}$ such that $s_j \ge s_m \ge \hat{s}_j$, and $m \notin \mathcal{S}$.

$$W_T(s^o) - W_T(\hat{s}^o) = (s_j - s_m)\big(R(s_{\Pi(1)}, \ldots, 1, \ldots, s_{\Pi(K)}) - R(s_{\Pi(1)}, \ldots, 0, \ldots, s_{\Pi(K)})\big) \le (s_j - \hat{s}_j)\,\Delta_u\,u(T),$$

where the equality follows similarly to Case 1, and the
inequality from the fact that $s_m \ge \hat{s}_j$ and from the
boundedness of $R(\cdot)$. Note that node
$m$ is the node with the highest belief state that is not
scheduled in $W_T(s^o)$, and the node with the lowest belief
state scheduled in $W_T(\hat{s}^o)$.

• Case 4: $j \notin \mathcal{S}$ and $j \in \hat{\mathcal{S}}$. This case is not possible
since the vectors $s^o$ and $\hat{s}^o$ are ordered and $s_j \ge \hat{s}_j$;
hence, if $\hat{s}_j$ is scheduled then $s_j$ must be scheduled too.

Now, we assume that (10) holds from TS $n + 1$ up to $T$, and
show that it holds for TS $n$ as well. We distinguish three cases:

• Case 1: $j \in \mathcal{S}$ and $j \in \hat{\mathcal{S}}$ in (18), i.e., node $j$ is
scheduled in both cases. The first and second summations
in the first line of (18a) correspond to the cases in
which node $j \in \mathcal{S}$ is idle and active, respectively,
in TS $n$. Similarly, the first and second summations in the
second line of (18a) correspond to the cases in which
node $j \in \hat{\mathcal{S}}$ is idle and active, respectively, in TS $n$.
Note that the belief state vector $s_{\overline{\mathcal{E} \cup j}}$ includes the belief
states of all the nodes in $s^o$ but those in $\mathcal{E}$ and $s_j$;
hence, it is equivalent to the belief state vector $\hat{s}_{\overline{\mathcal{E} \cup j}}$.
We use this fact to get (18b). Note that the belief state
vectors in (18b) differ only in the belief states of node
$j$, namely, $\tau(s_j)$ and $\tau(\hat{s}_j)$ are the beliefs of node $j$ in
vectors $[T(s_{\bar{\mathcal{E}}}), 0(|\mathcal{E}|)]$ and $[T(\hat{s}_{\bar{\mathcal{E}}}), 0(|\mathcal{E}|)]$, respectively;
and hence, we use the induction hypothesis in the summation
of (18b) to obtain (18c). The summation in (18c)
is over all possible operative/inoperative combinations of
the nodes in $\mathcal{S} \setminus \{j\}$, and it is equal to one. This fact
together with the boundedness and the decomposability
of $R(\cdot)$ are used in (18c) to get (18d). The contracting
property of $\tau(\cdot)$ and the definition of $u(n)$ are used in
(18e) and (18f), respectively.

• Case 2: $j \notin \mathcal{S}$ and $j \notin \hat{\mathcal{S}}$, i.e., the same nodes are
scheduled from $s^o$ and $\hat{s}^o$, and node $j$ is not scheduled
in either case. Then

$$W_n(s^o) - W_n(\hat{s}^o) = \beta \sum_{\mathcal{E} \subseteq \mathcal{S}} q(|\mathcal{E}|, K)\big(W_{n+1}([T(s_{\bar{\mathcal{E}}}), 0(|\mathcal{E}|)]) - W_{n+1}([T(\hat{s}_{\bar{\mathcal{E}}}), 0(|\mathcal{E}|)])\big) \quad (19a)$$

$$\le \Delta_u (s_j - \hat{s}_j)\,\beta\,u(n + 1) \quad (19b)$$

$$\le \Delta_u (s_j - \hat{s}_j)\,\beta \sum_{i=0}^{T-n-1} \left(\beta(1-p)\right)^i \quad (19c)$$

$$\le \Delta_u (s_j - \hat{s}_j)\,u(n), \quad (19d)$$

where (19a) follows since the values of the expected
immediate rewards in TS $n$ are the same. The belief state
vectors at TS $n + 1$ are equal but for the belief state of
node $j$, that is, $\tau(s_j)$ and $\tau(\hat{s}_j)$ are the beliefs of node
$j$ in $T(s_{\bar{\mathcal{E}}})$ and $T(\hat{s}_{\bar{\mathcal{E}}})$, respectively. In (19a), similarly
to (18c), (18d), and (18e), we apply the induction hypothesis,
the contracting map property, and the fact that
the summation is equal to one, to obtain (19b). We use
$\beta \le 1$ and the definition of $u(n)$ to obtain (19c) and
(19d), respectively.

• Case 3: $j \in \mathcal{S}$ and $j \notin \hat{\mathcal{S}}$ in (20), i.e., there exists
$m \in \hat{\mathcal{S}}$ such that $s_j \ge s_m = \hat{s}_m \ge \hat{s}_j$ and that $m \notin \mathcal{S}$.
Hence, $\mathcal{S}$ and $\hat{\mathcal{S}}$ differ only in one element. To
obtain (20a) we use the symmetry property of the pseudo
value function and the fact that the belief vectors are equal
but for node $j$; in (20b) we add and subtract a pseudo
value function, which has two nodes with the same belief
state $s_m$, and one is scheduled while the other is not.
We can group the pseudo value functions, and apply the
results of Case 1 and Case 2 above. In particular, for the
pseudo value functions in the first line of (20b), the belief
vectors are equal but for $s_j$ and $s_m$; moreover, $j \in \mathcal{S}$
and $m \in \hat{\mathcal{S}}$, and $s_j \ge s_m$, so we can apply the results of
Case 1. Similarly, for the two pseudo value functions in
the second line of (20b) we can use the results of Case 2.
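The role of $u(n)$ in the induction can be illustrated numerically. The sketch below assumes $u(n) = \sum_{i=0}^{T-n} (\beta(1-p))^i$, which is how the definition reads from (18f) and (18g) together with $u(T) = 1$; the exact definition is given earlier in the paper, so this form should be treated as an assumption.

```python
def u(n: int, T: int, beta: float, p: float) -> float:
    """Assumed form: u(n) = sum_{i=0}^{T-n} (beta * (1 - p)) ** i."""
    r = beta * (1.0 - p)
    return sum(r ** i for i in range(T - n + 1))


# Illustrative parameters (not taken from the paper).
T, beta, p = 10, 0.9, 0.5
r = beta * (1.0 - p)

# Recursion used between (18e) and (18g): u(n) = 1 + beta * (1 - p) * u(n + 1).
for n in range(T):
    assert abs(u(n, T, beta, p) - (1.0 + r * u(n + 1, T, beta, p))) < 1e-12

print(u(T, T, beta, p), u(0, T, beta, p))
```

Under this assumption $u(n)$ is decreasing in $n$ and bounded by $1/(1 - \beta(1-p))$, which is what the later steps of the proofs rely on.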

Appendix C 

We note that the set $\mathcal{S} = \{1, \ldots, K\}$ is the set of $K$ nodes
scheduled from $s$, and that the set $\mathcal{S}_\Pi$ is the set of nodes
scheduled from $s_\Pi$, that is, the first $K$ nodes as ordered
according to permutation $\Pi$. We only need to study the cases
in which $\mathcal{S}$ and $\mathcal{S}_\Pi$ are different, since the claim holds for
the others due to the symmetry of the pseudo value
function. We study the case $j \in \mathcal{S}$, $i \in \mathcal{S}_\Pi$, $i \notin \mathcal{S}$,
and $j \notin \mathcal{S}_\Pi$ in (21). The summation in (21a) is over all
operative/inoperative combinations of the nodes in $\mathcal{S} \setminus \{j\}$. We
denote the belief state of all nodes but those in $\mathcal{E}$ and $s_j$ by
$s_{\overline{\mathcal{E} \cup j}}$. The belief state of node $i$ in TS $n + 1$, $\tau(s_i)$, is in
vector $T(s_{\overline{\mathcal{E} \cup j}})$. Similarly, the belief state of node $j$ in TS
$n + 1$, $\tau(s_j)$, is in vector $T(s_{\Pi, \overline{\mathcal{E} \cup i}})$. The second pseudo value
functions in the first and second lines in (21a) cancel out, and
(21b) is obtained. We have applied the decomposability and
boundedness of $R(\cdot)$ to obtain (21c). Belief vectors $T(s_{\overline{\mathcal{E} \cup j}})$
and $T(s_{\Pi, \overline{\mathcal{E} \cup i}})$ in (21c) are ordered and only differ in one
element, $\tau(s_i)$ and $\tau(s_j)$, respectively, where $\tau(s_i) \le \tau(s_j)$,
and hence, we use Lemma 2 to get (21d); (21e) follows since
$\tau(\cdot)$ is a monotonically increasing contracting map, (21f) since
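$$W_n(s^o) - W_n(\hat{s}^o)$$

$$= R(s^{\mathcal{S}}) + (1-p)\beta \sum_{\mathcal{E} \subseteq \mathcal{S} \setminus \{j\}} q(|\mathcal{E}|, K-1)\, W_{n+1}([T(s_{\bar{\mathcal{E}}}), 0(|\mathcal{E}|)]) + p\beta \sum_{\mathcal{E} \subseteq \mathcal{S} \setminus \{j\}} q(|\mathcal{E}|, K-1)\, W_{n+1}([T(s_{\overline{\mathcal{E} \cup j}}), 0(|\mathcal{E}|+1)])$$

$$\quad - R(\hat{s}^{\hat{\mathcal{S}}}) - (1-p)\beta \sum_{\mathcal{E} \subseteq \hat{\mathcal{S}} \setminus \{j\}} q(|\mathcal{E}|, K-1)\, W_{n+1}([T(\hat{s}_{\bar{\mathcal{E}}}), 0(|\mathcal{E}|)]) - p\beta \sum_{\mathcal{E} \subseteq \hat{\mathcal{S}} \setminus \{j\}} q(|\mathcal{E}|, K-1)\, W_{n+1}([T(\hat{s}_{\overline{\mathcal{E} \cup j}}), 0(|\mathcal{E}|+1)]) \quad (18a)$$

$$= R(s^{\mathcal{S}}) - R(\hat{s}^{\hat{\mathcal{S}}}) + (1-p)\beta \sum_{\mathcal{E} \subseteq \mathcal{S} \setminus \{j\}} q(|\mathcal{E}|, K-1)\big(W_{n+1}([T(s_{\bar{\mathcal{E}}}), 0(|\mathcal{E}|)]) - W_{n+1}([T(\hat{s}_{\bar{\mathcal{E}}}), 0(|\mathcal{E}|)])\big) \quad (18b)$$

$$\le R(s^{\mathcal{S}}) - R(\hat{s}^{\hat{\mathcal{S}}}) + (1-p)\beta \sum_{\mathcal{E} \subseteq \mathcal{S} \setminus \{j\}} q(|\mathcal{E}|, K-1)\big(\Delta_u (\tau(s_j) - \tau(\hat{s}_j))\, u(n+1)\big) \quad (18c)$$

$$\le \Delta_u (s_j - \hat{s}_j) + (1-p)\beta \Delta_u (\tau(s_j) - \tau(\hat{s}_j))\, u(n+1) \quad (18d)$$

$$\le \Delta_u (s_j - \hat{s}_j) + (1-p)\beta \Delta_u (s_j - \hat{s}_j)\, u(n+1) \quad (18e)$$

$$\le \Delta_u (s_j - \hat{s}_j)\Big(1 + \beta(1-p) \sum_{i=0}^{T-n-1} \left(\beta(1-p)\right)^i\Big) \quad (18f)$$

$$= \Delta_u (s_j - \hat{s}_j)\, u(n), \quad (18g)$$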


. Sj , 



- W„(s^(i),...,s,.„,s°(^^i),. 

. . , Sj,... 

’ ^n(Ar)^ 





■ • ? £m: ■ • 


■ ’ '®n(if)’ ■ 

. . ,Sj, . . 

■ ’ ®n(Af)^ 

(20a) 



■ • ■ • 


■ ’ '®n(ic)’ ■ 

■ • 7 ■ • 



+ l^n(s°(l),.. 


• ■ 5 • ■ 


■ ’ ^n(K)’ ■ 

..,Sj,.. 


(20b) 

^ ^u{_£j £m 

)u{n) + Au(Sm - 

s^)u(n) 





(20c) 


= Au{sj — Sj)u(n). (20d) 


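$$W_n(s) - W_n(s_\Pi)$$

$$= R(s^{\mathcal{S}}) - R(s_\Pi^{\mathcal{S}_\Pi}) + \beta \sum_{\mathcal{E} \subseteq \mathcal{S} \setminus \{j\}} q(|\mathcal{E}|, K-1) \Big[ p W_{n+1}([T(s_{\overline{\mathcal{E} \cup j}}), 0(|\mathcal{E}|+1)]) + (1-p) W_{n+1}([T(s_{\bar{\mathcal{E}}}), 0(|\mathcal{E}|)])$$

$$\quad - p W_{n+1}([T(s_{\Pi, \overline{\mathcal{E} \cup i}}), 0(|\mathcal{E}|+1)]) - (1-p) W_{n+1}([T(s_{\Pi, \bar{\mathcal{E}}}), 0(|\mathcal{E}|)]) \Big] \quad (21a)$$

$$= R(s^{\mathcal{S}}) - R(s_\Pi^{\mathcal{S}_\Pi}) - p\beta \sum_{\mathcal{E} \subseteq \mathcal{S} \setminus \{j\}} q(|\mathcal{E}|, K-1)\big(W_{n+1}([T(s_{\Pi, \overline{\mathcal{E} \cup i}}), 0(|\mathcal{E}|+1)]) - W_{n+1}([T(s_{\overline{\mathcal{E} \cup j}}), 0(|\mathcal{E}|+1)])\big) \quad (21b)$$

$$\ge \Delta_l (s_j - s_i) - p\beta \sum_{\mathcal{E} \subseteq \mathcal{S} \setminus \{j\}} q(|\mathcal{E}|, K-1)\big(W_{n+1}([T(s_{\Pi, \overline{\mathcal{E} \cup i}}), 0(|\mathcal{E}|+1)]) - W_{n+1}([T(s_{\overline{\mathcal{E} \cup j}}), 0(|\mathcal{E}|+1)])\big) \quad (21c)$$

$$\ge \Delta_l (s_j - s_i) - p\beta \sum_{\mathcal{E} \subseteq \mathcal{S} \setminus \{j\}} \big(q(|\mathcal{E}|, K-1)\, \Delta_u (\tau(s_j) - \tau(s_i))\, u(n+1)\big) \quad (21d)$$

$$\ge \Delta_l (s_j - s_i) - p\beta \Delta_u (s_j - s_i)\, u(n+1) \quad (21e)$$

$$\ge \Delta_l (s_j - s_i) - p\beta \Delta_u (s_j - s_i)\, u(0) \quad (21f)$$

$$= (s_j - s_i)\Big(\Delta_l - p\beta \Delta_u \frac{1 - (\beta(1-p))^{T+1}}{1 - \beta(1-p)}\Big) \ge 0 \quad (21g)$$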


$u(n)$ is decreasing in $n$; finally, (21g) follows since $u(0)$ is the
sum of a geometric series.


Appendix D 

We again use backward induction. Lemma 4 holds trivially
for $n = T$. Note that in (15a) the sets of nodes scheduled
in the pseudo value functions $W_n(s_{\tilde{\Pi}})$ and $W_n(s)$ are
$\{1, \ldots, K-1, N\}$ and $\{1, \ldots, K\}$, respectively. That is, node
$K$ is scheduled in $W_n(s)$, but not in $W_n(s_{\tilde{\Pi}})$; and node $N$ is
scheduled in $W_n(s_{\tilde{\Pi}})$, but not in $W_n(s)$. To prove that (15a)
holds at TS $n$ we use a sample path argument similarly to
ini, and assume that the realizations of the EH processes of
nodes $K$ and $N$ are either 0 or 1. There are four different
cases, but here we only consider one, since the others follow
similarly.

We consider the case in which the EH processes have
realizations $E_K(n) = 1$ and $E_N(n) = 0$. We denote by
$\mathcal{K} = \{1, \ldots, K-1\}$ the set of nodes scheduled in both sides
of (15a). If $\mathcal{E}$ is the set of active nodes, we denote the set
of nodes in $\mathcal{K}$ that remain idle by $\mathcal{K}^i = \mathcal{K} \setminus \mathcal{E}$. We denote
the nodes that are not scheduled in either side of (15a) by
$\mathcal{K}^s = \overline{\mathcal{K} \cup \{K, N\}}$. We denote the set $\{0, 1\}$ by $\mathbb{B}$. From
the left hand side of (15a) we obtain (22), where in (22c) we
have applied the induction hypothesis of (15a), the symmetry
of the pseudo value function, the inequality $p_{11} \ge p_{00}$, and
the definition of $R(\cdot)$. This concludes the proof of (15a).

Now we prove the second part of Lemma 4, (15b). There
are three cases:

• Case 1: $j, i \le K$, i.e., nodes $j$ and $i$ are scheduled on
both sides of (15b). The inequality holds since the pseudo
value function is symmetric.

• Case 2: $j \le K$ and $i > K$ in (23), i.e., nodes
$j$ and $i$ are scheduled on the left and right hand
sides of (15b), respectively. To prove the inequality
we use the linearity of the pseudo value function
(14). Since $s_j \ge s_i$, using (14), we only
need to prove that $W_n(s_1, \ldots, 1, \ldots, 0, \ldots, s_N) -
W_n(s_1, \ldots, 0, \ldots, 1, \ldots, s_N) \ge 0$. We denote the
scheduled nodes in both sides of (15b) by $\mathcal{K} =
\{1, \ldots, K\} \setminus \{j\}$, the set of nodes in $\mathcal{K}$ that remain idle
by $\mathcal{K}^i = \mathcal{K} \setminus \mathcal{E}$, and the nodes that are not scheduled in
either side of (15b) by $\mathcal{K}^s = \overline{\mathcal{K} \cup \{j, i\}}$. We denote the
belief vector $(s_1, \ldots, s_j = 1, \ldots, s_i = 0, \ldots, s_N)$ by $s$,
its $i,j$-swap by $s_\Pi$, and define $s^K = (s_1, \ldots, s_K)$. In (23) we
have used the induction hypothesis of (15b) and (15a) in
(23b) and (23c), respectively, and the fact that $\beta \le 1$.

• Case 3: nodes $j$ and $i$ are not scheduled. The inequality
holds in this case by applying the definition of the pseudo
value function and the induction hypothesis of (15b).
$$1 + W_n(s_N, s_1, \ldots, s_{N-1})$$

= 1 + -R(sf) +/3X! X! pW^n+1 (Pii (S^f) ,'r(sK:0)Si^ = Pii,r{sK.s),SN =poi,Poi {'^k)) 

£Qic ZfgBi^l 

+ (1 — p)Wn+i (Pii (S/f), SN = poi,'I'(sk:Oi ■Sif = Pii7'^(s/C'»)) Poi i^k)) (22a) 

>p + i?(s~) +/3^ Hk,K-l) pWn+l (Pll (S/f) ,T{s!ci),SK = Pll,T{SfCs),SN =Poi,Poi (^k)) 

£QIC ZfgBi^l 

+ (1 -p)(l + ^n+1 (Pii i^k),SN =poi,T{sici),SK = pii,t{sjcs),Poi )) (22b) 

> p + E(s~ ) +/3^ ^ h{ls,K-l) pWn+1 (Pll {T,Ie) ,t{s,c'),sk = pii,t{sjcs),sn =poi,Poi i^k)) 

S^K. 

+ (1 -p)Wn+i (Pll {^k) ,r{sK.i),SK =pii,t(sk;0,Poi (SZg) ,sn = Poi) (22c) 

= i?(s^) +/3^ ^ /l(/£,i4:-l) pWn+1 (Pll {T,Is),Sk =Pll,T{s^i),T{s,ck^SN =poi,Poi i'^k)) 

£<ZK Z^eBlEI 

+ (1 — p)Wn+l (Pll (S/f) ,T{sici),SK = P 11 ,t{sics)^sn = poi, Poi {^k)) (22d) 

= T^n(s) (22e) 


Wn{s) =i?(s^)+/ 3 ^ ^ h{l£,K-l) pWn +1 {Pll{T,l£) ,Sj = pii,T{Sici),T{sicsai),Poi i^k)) 

£QIC 

+ {l-p)Wn+l (Pll (E/f) ,T(SKiuj),T(s/c<.Ui),Poi {^k)) 
>R{s^)-p + p'^ ^ h{ls,K-l) p[l + Wn+l (Pll(EZf),Si =poi,T(s^i),T(sK;.Uj),Poi (E/f 

+ (1 -p)V14+i (Pll (E/f) ,T(sKiui)7'r(sK:-uj),Poi {^k)) 

> R{s^) -p + / 3 ^ ^ h{lE,K-l) pWn +1 (Pll (E^f) ,T(s^i),T(sK:«uj),Poi (EZb) , Si =Poi) 

fiCK /£6BI£| 

+ (1 -p)Wn+i (Pll (E/e) ,T(sKiui),'r(s/c-uj),Poi (S/e)) 

= ^^^(sn) 


(23a) 


(23b) 


(23c) 

(23d) 